class: inverse, center, middle name: hospital_pricing
Lots of different payers paying lots of different prices: - Medicare fee-for-service prices - Medicaid payments - Private insurance negotiations (including Medicare Advantage) - But what about the price to patients?
–
.center[ Price \(\neq\) charge \(\neq\) cost \(\neq\) patient out-of-pocket spending]
Not clear what exactly is negotiated…
| .pull-left[ ### Fee-for-service - price per procedure - percentage of charges - markup over Medicare rates] |
|---|
| .pull-right[ ### Capitation - payment per patient - pay-for-performance - shared savings] |
We’ll get into the real data in a bit, but for now…a few facts:
Hospital services are expensive
Prices vary dramatically across different areas
Lack of competition is a major reason for high prices
class: inverse, center, middle name: potential_outcomes
.pull-left[
Y(1)= $75,000
]
.pull-right[
Y(0)= ? ]
–Treatment effect =Y(1)-Y(0)= ?
class: inverse, center, middle name: selection_observables
More formally:
- \(E[Y(1)|W,shape]=E[Y(1)|shape]\) - \(Y(1),Y(0) \perp\!\!\!\perp W | shape\) |
|---|
| In words…nothing unobserved that determines treatment selection and affects your outcome of interest. |
Someone of each type must be in both the treated and untreated groups
| \[0 < \text{Pr}(W=1|X) <1\] |
|---|
| # Assumption 2: Common Support |
class: inverse, center, middle name: methods
We don’t observe the counterfactual outcome…what would have happened if a treated unit was actually untreated.
ALL attempts at causal inference represent some attempt at estimating the counterfactual outcome. We need an estimate for \(E[Y_{i}(0)]\) among those that were treated, and vice versa for \(E[Y_{i}(1)]\).
With selection on observables and common support: 1. Matching estimators: \[E[Y_{i}(1) - Y_{i}(0)] = E[ E[Y_{i}|W_{i}=1, X_{i}] - E[Y_{i}|W_{i}=0, X_{i}] ]\] 2. Regression estimators: \[E[Y_{i}(1) - Y_{i}(0)] = E[ E[Y_{i}|W_{i}=1, X_{i}] ] - E[ E[Y_{i}|W_{i}=0, X_{i}] ]\]
What’s the difference? |
|---|
| # Matching |
Impute \(\hat{Y}_{i}(0)\) and \(\hat{Y}_{i}(1)\) for each observation:
\[\hat{Y}_{i}(0) = \begin{cases}
Y_{i} & \text{if} & W_{i}=0 \\
\frac{1}{m} \sum_{j \in J_{m}(i)} Y_{j} & \text{if} & W_{i}=1
\end{cases}\] \[\hat{Y}_{i}(1) = \begin{cases}
Y_{i} & \text{if} & W_{i}=1 \\
\frac{1}{m} \sum_{j \in J_{m}(i)} Y_{j} & \text{if} & W_{i}=0
\end{cases}\]
Form “matched” ATE:
\(\hat{\delta}^{\text{match}} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{Y}_{i}(1) - \hat{Y}_{i}(0) \right)\)
.center[ ]
.center[ ]
nn.est1 <- Matching::Match(Y=select.dat$y,
Tr=select.dat$w,
X=select.dat$x,
M=1,
Weight=1,
estimand="ATE")
summary(nn.est1)
##
## Estimate... 4.0175
## AI SE...... 0.52954
## T-stat..... 7.5869
## p.val...... 3.2863e-14
##
## Original number of observations.............. 5000
## Original number of treated obs............... 1732
## Matched number of observations............... 5000
## Matched number of observations (unweighted). 5016
reg1.dat <- select.dat %>% filter(w==1)
reg1 <- lm(y ~ x, data=reg1.dat)
reg0.dat <- select.dat %>% filter(w==0)
reg0 <- lm(y ~ x, data=reg0.dat)
pred1 <- predict(reg1,new=select.dat)
pred0 <- predict(reg0,new=select.dat)
mean(pred1-pred0)
## [1] 4.076999
class: inverse, center, middle name: hcris
.center[ ]
hcris.data %>%
ggplot(aes(x=as.factor(year))) +
geom_bar() +
labs(
x="Year",
y="Number of Hospitals",
title="Number of Hospitals per Year"
) + theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust=1))
.plot-callout[ ]
hcris.data <- hcris.data %>%
mutate( discount_factor = 1-tot_discounts/tot_charges,
price_num = (ip_charges + icu_charges + ancillary_charges)*discount_factor - tot_mcare_payment,
price_denom = tot_discharges - mcare_discharges,
price = price_num/price_denom)
.left-code[
hcris.data %>% group_by(year) %>%
filter(price_denom>100, !is.na(price_denom),
price_num>0, !is.na(price_num),
price<100000) %>% #<<
select(price, year) %>%
summarize(mean_price=mean(price, na.rm=TRUE)) %>%
ggplot(aes(x=as.factor(year), y=mean_price)) +
geom_line(aes(group=1)) +
labs(
x="Year",
y="Average Hospital Price",
title="Hospital Prices per Year"
) + scale_y_continuous(labels=comma) +
theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust=1))
]
.right-plot[ ]
class: inverse, center, middle name: price_pfp
Always important to look at your data before doing any formal analysis. Ask yourself a few questions: 1. Are the magnitudes reasonable?
Are there lots of missing values?
Are there clear examples of misreporting?
We’ve adopted a very brute force way to deal with outlier prices. Other approaches include: 1. Investigate very closely the hospitals with extreme values
Winsorize at certain thresholds (replace extreme values with pre-determined thresholds)
Impute prices for extreme hospitals
Are penalized hospitals sufficiently similar to non-penalized hospitals?
Let’s look at covariate balance using a love plot, part of the library(cobalt) package. |
|---|
| # Love plots without adjustment |
r love.plot(bal.tab(lp.covs,treat=lp.vars$penalty), colors="black", shapes="circle", threshold=0.1) + theme_bw() + theme(legend.position="none") |
| .plot-callout[ |
m.exact <- Matching::Match(Y=lp.vars$price,
Tr=lp.vars$penalty,
X=lp.covs,
M=1,
exact=TRUE) #<<
print(m.exact)
## [1] NA
## attr(,"class")
## [1] "Match"
love.plot(bal.tab(m.exact, covs = lp.covs2, treat = lp.vars$penalty),
threshold=0.1,
grid=FALSE, sample.names=c("Unmatched", "Matched"),
position="top", shapes=c("circle","triangle"),
colors=c("black","blue")) +
theme_bw()
.plot-callout[ ]
m.nn.var <- Matching::Match(Y=lp.vars$price,
Tr=lp.vars$penalty,
X=lp.covs,
M=4, #<<
Weight=1,
estimand="ATE")
v.name=data.frame(new=c("Beds","Medicaid Discharges", "Inaptient Charges",
"Medicare Discharges", "Medicare Payments"))
love.plot(bal.tab(m.nn.var2, covs = lp.covs, treat = lp.vars$penalty),
threshold=0.1,
var.names=v.name,
grid=FALSE, sample.names=c("Unmatched", "Matched"),
position="top", shapes=c("circle","triangle"),
colors=c("black","blue")) +
theme_bw()
.plot-callout[ ]
m.nn.md <- Matching::Match(Y=lp.vars$price,
Tr=lp.vars$penalty,
X=lp.covs,
M=1,
Weight=2,
estimand="ATE")
love.plot(bal.tab(m.nn.ps, covs = lp.covs, treat = lp.vars$penalty),
threshold=0.1,
var.names=v.name,
grid=FALSE, sample.names=c("Unmatched", "Matched"),
position="top", shapes=c("circle","triangle"),
colors=c("black","blue")) +
theme_bw()
.plot-callout[ ]
##
## Estimate... -526.95
## AI SE...... 223.06
## T-stat..... -2.3623
## p.val...... 0.01816
##
## Original number of observations.............. 2707
## Original number of treated obs............... 698
## Matched number of observations............... 2707
## Matched number of observations (unweighted). 2711
##
## Estimate... -201.03
## AI SE...... 275.76
## T-stat..... -0.72898
## p.val...... 0.46601
##
## Original number of observations.............. 2707
## Original number of treated obs............... 698
## Matched number of observations............... 2707
## Matched number of observations (unweighted). 14795
ipw.reg <- lm(price ~ penalty, data=lp.vars, weights=ipw)
summary(ipw.reg)
##
## Call:
## lm(formula = price ~ penalty, data = lp.vars, weights = ipw)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -18691 -4802 -1422 2651 94137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9876.4 147.8 66.808 <2e-16 ***
## penaltyTRUE -196.9 211.2 -0.932 0.351
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7829 on 2705 degrees of freedom
## Multiple R-squared: 0.0003211, Adjusted R-squared: -4.85e-05
## F-statistic: 0.8688 on 1 and 2705 DF, p-value: 0.3514
reg.dat <- lp.vars %>% ungroup() %>% filter(complete.cases(.)) %>%
mutate(beds_diff = penalty*(beds - mean(beds)),
mcaid_diff = penalty*(mcaid_discharges - mean(mcaid_discharges)),
ip_diff = penalty*(ip_charges - mean(ip_charges)),
mcare_diff = penalty*(mcare_discharges - mean(mcare_discharges)),
mpay_diff = penalty*(tot_mcare_payment - mean(tot_mcare_payment)))
reg <- lm(price ~ penalty + beds + mcaid_discharges + ip_charges + mcare_discharges + tot_mcare_payment +
beds_diff + mcaid_diff + ip_diff + mcare_diff + mpay_diff,
data=reg.dat)
class: inverse, center, middle name: summary
If we assume selection on observables holds, then we only need to condition on the relevant covariates to identify a causal effect. But we still need to ensure common support…
–
1. Matching 2. Reweighting 3. Regression